style vector
- North America > United States (0.14)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.82)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.67)
Navigating the Synchrony-Stability Frontier in Adaptive Chatbots
Adaptive chatbots that mimic a user's linguistic style can build rapport and engagement, yet unconstrained mimicry risks an agent that feels unstable or sycophantic. We present a computational evaluation framework that makes the core design tension explicit: balancing moment-to-moment linguistic synchrony against long-term persona stability. Using an 8-dimensional style vector and a closed-loop "base+delta" prompting architecture, we simulate and compare explicit adaptation policies - Uncapped, Cap, Exponential Moving Average (EMA), Dead-Band, and Hybrids - on a human-log dataset. Our analysis maps a clear Pareto frontier: bounded policies achieve substantial gains in stability at a modest cost to synchrony. For example, a Hybrid (EMA+Cap) raises stability from 0.542 to 0.878 (+62%) while reducing synchrony by only 17%. We confirm this trade-off through large-scale replications on three public corpora (DailyDialog, Persona-Chat, EmpatheticDialogues) and LLM-in-the-loop validation across two model families. Furthermore, we quantify "prompt legibility," showing that frontier policies reduce instruction churn and cut jarring register flips (major tone changes) from 0.254 to 0.092, yielding systems that are easier to reason about and maintain. Taken together, our framework provides a general evaluation harness for style adaptation; a systematic ablation that identifies Pareto-efficient policies; robust validation across diverse datasets and models; and novel legibility metrics linking policy choices to system maintainability.
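The adaptation policies named in this abstract can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the update rules, the parameter names (`max_step`, `alpha`, `width`), and their default values below are all assumptions chosen for illustration. Each function takes the agent's current 8-dimensional style vector and the style observed in the user's latest turn, and returns the agent's next style vector.

```python
from typing import List

Vec = List[float]


def uncapped(current: Vec, target: Vec) -> Vec:
    """Mirror the user's style immediately (no stability constraint)."""
    return list(target)


def cap(current: Vec, target: Vec, max_step: float = 0.1) -> Vec:
    """Move toward the target, but by at most max_step per dimension per turn."""
    return [c + max(-max_step, min(max_step, t - c)) for c, t in zip(current, target)]


def ema(current: Vec, target: Vec, alpha: float = 0.3) -> Vec:
    """Exponential moving average: blend the target into the current style."""
    return [(1 - alpha) * c + alpha * t for c, t in zip(current, target)]


def dead_band(current: Vec, target: Vec, width: float = 0.05) -> Vec:
    """Ignore differences no larger than width; otherwise follow the target."""
    return [t if abs(t - c) > width else c for c, t in zip(current, target)]


def hybrid_ema_cap(current: Vec, target: Vec, alpha: float = 0.3,
                   max_step: float = 0.1) -> Vec:
    """Smooth with EMA first, then cap the resulting per-turn change."""
    return cap(current, ema(current, target, alpha), max_step)


# Hypothetical example: the user's style differs sharply on one dimension.
base = [0.5] * 8
user = [1.0] + [0.5] * 7
next_style = hybrid_ema_cap(base, user)
```

In this sketch the bounded policies limit how far the style can move in a single turn, which is one plausible reading of why they trade some synchrony for stability.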
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Ireland (0.04)
- North America > United States > Minnesota (0.04)
- (13 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework
Fu, Siming, Dong, Sijun, Meng, Xiaoliang
Despite the remarkable success of Self-Supervised Learning (SSL), its generalization is fundamentally hindered by Shortcut Learning, where models exploit superficial features like texture instead of intrinsic structure. We experimentally verify this flaw within the generative paradigm (e.g., MAE) and argue it is a systemic issue also affecting discriminative methods, identifying it as the root cause of their failure on unseen domains. While existing methods often tackle this at a surface level by aligning or separating domain-specific features, they fail to alter the underlying learning mechanism that fosters shortcut dependency. To address this at its core, we propose HyGDL (Hybrid Generative-Discriminative Learning Framework), a hybrid framework that achieves explicit content-style disentanglement. Our approach is guided by the Invariance Pre-training Principle: forcing a model to learn an invariant essence by systematically varying a bias (e.g., style) at the input while keeping the supervision signal constant. HyGDL operates on a single encoder and analytically defines style as the component of a representation that is orthogonal to its style-invariant content, derived via vector projection. This is operationalized through a synergistic design: (1) a self-distillation objective learns a stable, style-invariant content direction; (2) an analytical projection then decomposes the representation into orthogonal content and style vectors; and (3) a style-conditioned reconstruction objective uses these vectors to restore the image, providing end-to-end supervision. Unlike prior methods that rely on implicit heuristics, this principled disentanglement allows HyGDL to learn truly robust representations, demonstrating superior performance on benchmarks designed to diagnose shortcut learning. 
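The analytical projection step in this abstract, where style is the component of a representation orthogonal to its content, can be sketched as a plain vector projection. This is an illustrative sketch only: it assumes the content direction is available as a single vector, and the function name `decompose` and the toy values are hypothetical rather than taken from HyGDL.

```python
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(z: Vec, content_dir: Vec) -> Tuple[Vec, Vec]:
    """Split z into a part along content_dir and an orthogonal residual.

    content = (<z, d> / <d, d>) * d   (projection onto the content direction)
    style   = z - content             (orthogonal to d by construction)
    """
    norm_sq = dot(content_dir, content_dir)
    if norm_sq == 0.0:
        raise ValueError("content direction must be non-zero")
    scale = dot(z, content_dir) / norm_sq
    content = [scale * d for d in content_dir]
    style = [zi - ci for zi, ci in zip(z, content)]
    return content, style


# Toy example: a 2-D representation and an axis-aligned content direction.
content, style = decompose([3.0, 4.0], [1.0, 0.0])
```

By construction, the two parts sum back to the original representation and their inner product is zero.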
Self-Supervised Learning (SSL) has recently emerged as a dominant paradigm in representation learning (Grill et al., 2020; Chen et al., 2020a; Siméoni et al., 2025; Gui et al., 2024; He et al., 2020). Consequently, a significant body of research has aimed to enhance its domain generalization, often by addressing the model's well-documented texture bias (Geirhos et al., 2019). We argue, however, that such approaches often treat the symptom rather than the cause. In this work, we posit that poor generalization stems from a more fundamental problem: the inherent tendency of models towards Shortcut Learning (Geirhos et al., 2020), wherein they exploit superficial features (e.g., texture) that are spuriously correlated with the learning objective, instead of learning the intrinsic, generalizable structure of the data.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hubei Province > Wuhan (0.04)
- Europe > Switzerland (0.04)
Personalized Text Generation with Contrastive Activation Steering
Zhang, Jinghao, Liu, Yuting, Wang, Wenjie, Liu, Qiang, Wu, Shu, Wang, Liang, Chua, Tat-Seng
Personalized text generation aims to infer users' writing style preferences from their historical texts and generate outputs that faithfully reflect these stylistic characteristics. Existing solutions primarily adopt two paradigms: retrieval-augmented generation (RAG) and parameter-efficient fine-tuning (PEFT). While these approaches have advanced the field, they suffer from two critical limitations: (1) the entanglement of content semantics and stylistic patterns in historical texts impedes accurate modeling of user-specific writing preferences; and (2) scalability challenges arising from both the inference latency that RAG incurs through retrieval operations and the parameter storage that PEFT requires for a per-user model. To overcome these limitations, we propose StyleVector, a training-free framework that disentangles and represents personalized writing style as a vector in the LLM's activation space, enabling style-steered generation during inference without requiring costly retrieval or parameter storage. Comprehensive experiments demonstrate that our framework achieves a significant 8% relative improvement in personalized generation while reducing storage requirements by 1700 times over the PEFT method.
- North America > United States (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China (0.04)
- (2 more...)
- Information Technology (0.46)
- Education (0.46)
Generative Modeling of Individual Behavior at Scale
Omi, Nabil, Caccia, Lucas, Sarkar, Anurag, Ash, Jordan T., Sen, Siddhartha
There has been a growing interest in using AI to model human behavior, particularly in domains where humans interact with this technology. While most existing work models human behavior at an aggregate level, our goal is to model behavior at the individual level. Recent approaches to behavioral stylometry -- or the task of identifying a person from their actions alone -- have shown promise in domains like chess, but these approaches are either not scalable (e.g., fine-tune a separate model for each person) or not generative, in that they cannot generate actions. We address these limitations by framing behavioral stylometry as a multi-task learning problem -- where each task represents a distinct person -- and use parameter-efficient fine-tuning (PEFT) methods to learn an explicit style vector for each person. Style vectors are generative: they selectively activate shared "skill" parameters to generate actions in the style of each person. They also induce a latent space that we can interpret and manipulate algorithmically. In particular, we develop a general technique for style steering that allows us to steer a player's style vector towards a desired property. We apply our approach to two very different games, at unprecedented scales: chess (47,864 players) and Rocket League (2,000 players). We also show generality beyond gaming by applying our method to image generation, where we learn style vectors for 10,177 celebrities and use these vectors to steer their images.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
- North America > Dominican Republic (0.04)
- (3 more...)
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis
Liu, Rui, Jia, Zhenqi, Bao, Feilong, Li, Haizhou
Conversational speech synthesis (CSS) aims to take the current dialogue (CD) history as a reference to synthesize expressive speech that aligns with the conversational style. Unlike CD, stored dialogue (SD) contains preserved dialogue fragments from earlier stages of user-agent interaction, which include style expression knowledge relevant to scenarios similar to those in CD. Note that this knowledge plays a significant role in enabling the agent to synthesize expressive conversational speech that generates empathetic feedback. However, prior research has overlooked this aspect. To address this issue, we propose a novel Retrieval-Augmented Dialogue Knowledge Aggregation scheme for expressive CSS, termed RADKA-CSS, which includes three main components: 1) To effectively retrieve dialogues from SD that are similar to CD in terms of both semantics and style, we first build a stored dialogue semantic-style database (SDSSD) which includes the text and audio samples. Then, we design a multi-attribute retrieval scheme to match the dialogue semantic and style vectors of the CD with the stored dialogue semantic and style vectors in the SDSSD, retrieving the most similar dialogues. 2) To effectively utilize the style knowledge from CD and SD, we propose adopting the multi-granularity graph structure to encode the dialogue and introducing a multi-source style knowledge aggregation mechanism. 3) Finally, the aggregated style knowledge is fed into the speech synthesizer to help the agent synthesize expressive speech that aligns with the conversational style. We conducted a comprehensive and in-depth experiment based on the DailyTalk dataset, which is a benchmark dataset for the CSS task. Both objective and subjective evaluations demonstrate that RADKA-CSS outperforms baseline models in expressiveness rendering. Code and audio samples can be found at: https://github.com/Coder-jzq/RADKA-CSS.
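The multi-attribute retrieval in component 1 can be pictured as a similarity search over paired semantic and style vectors. The sketch below is not the RADKA-CSS implementation: the weighted sum of cosine similarities, the `w_style` parameter, and the dictionary layout of the store are assumptions made for illustration.

```python
import math
from typing import Dict, List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve(query_sem: Vec, query_style: Vec, store: List[Dict],
             k: int = 1, w_style: float = 0.5) -> List[str]:
    """Return ids of the k stored dialogues most similar to the query.

    The score is a weighted sum of semantic and style cosine similarity
    (an assumed combination rule, not one stated in the abstract).
    """
    scored = []
    for entry in store:
        score = ((1.0 - w_style) * cosine(query_sem, entry["semantic"])
                 + w_style * cosine(query_style, entry["style"]))
        scored.append((score, entry["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry_id for _, entry_id in scored[:k]]


# Hypothetical stored dialogues with toy 2-D semantic and style vectors.
store = [
    {"id": "A", "semantic": [1.0, 0.0], "style": [1.0, 0.0]},
    {"id": "B", "semantic": [1.0, 0.0], "style": [0.0, 1.0]},
    {"id": "C", "semantic": [0.0, 1.0], "style": [0.0, 1.0]},
]
top = retrieve([1.0, 0.0], [0.0, 1.0], store, k=1)
```

Here only dialogue "B" matches the query on both attributes, so it ranks first; the other two match on one attribute each.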
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > Mongolia (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
Komiya, Kazuma, Fukuhara, Yoshihisa
There have been several studies on automatically generating piano covers, and recent advancements in deep learning have enabled the creation of more sophisticated covers. However, existing automatic piano cover models still have room for improvement in terms of expressiveness and fidelity to the original. To address these issues, we propose a learning algorithm called AMT-APC, which leverages the capabilities of automatic music transcription models. By utilizing the strengths of well-established automatic music transcription models, we aim to improve the accuracy of piano cover generation. Our experiments demonstrate that the AMT-APC model reproduces original tracks more accurately than any existing model.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Li, Yinghao Aaron, Jiang, Xilin, Darefsky, Jordan, Zhu, Ge, Mesgarani, Nima
The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resources required. The conventional approach of cascading automatic speech recognition (ASR), LLM, and text-to-speech (TTS) models in a pipeline, while effective, suffers from unnatural prosody because it lacks direct interactions between the input audio and its transcribed text and the output audio. These systems are also limited by their inherent latency from the ASR process for real-time applications. This paper introduces Style-Talker, an innovative framework that fine-tunes an audio LLM alongside a style-based TTS model for fast spoken dialog generation. Style-Talker takes user input audio and uses transcribed chat history and speech styles to generate both the speaking style and text for the response. Subsequently, the TTS model synthesizes the speech, which is then played back to the user. While the response speech is being played, the input speech undergoes ASR processing to extract the transcription and speaking style, serving as the context for the ensuing dialogue turn. This novel pipeline accelerates the traditional cascade ASR-LLM-TTS systems while integrating rich paralinguistic information from input speech. Our experimental results show that Style-Talker significantly outperforms the conventional cascade and speech-to-speech baselines in terms of both dialogue naturalness and coherence while being more than 50% faster.
- North America > United States > New York > Monroe County > Rochester (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- Asia > Middle East > Jordan (0.04)
Style Vectors for Steering Generative Large Language Model
Konen, Kai, Jentzsch, Sophie, Diallo, Diaoulé, Schütt, Peer, Bensch, Oliver, Baff, Roxanne El, Opitz, Dominik, Hecking, Tobias
This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activations of hidden layers during text generation. We show that style vectors can be simply computed from recorded layer activations for input texts in a specific style in contrast to more complex training-based approaches. Through a series of experiments, we demonstrate the effectiveness of activation engineering using such style vectors to influence the style of generated text in a nuanced and parameterisable way, distinguishing it from prompt engineering. The presented research constitutes a significant step towards developing more adaptive and effective AI-empowered interactive systems.
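The idea in this abstract, computing a style vector from recorded activations and adding it to hidden-layer activations during generation, can be sketched with toy data. Plain Python lists stand in for real hidden states here, and no actual LLM API is used. Contrasting the styled activations against a neutral set is an assumption (the abstract only says the vector is computed from activations of texts in the target style), and the function names and `strength` parameter are hypothetical.

```python
from typing import List

Vec = List[float]


def mean_vector(rows: List[Vec]) -> Vec:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def style_vector(style_acts: List[Vec], neutral_acts: List[Vec]) -> Vec:
    """Assumed recipe: mean styled activation minus mean neutral activation."""
    styled = mean_vector(style_acts)
    neutral = mean_vector(neutral_acts)
    return [s - n for s, n in zip(styled, neutral)]


def steer(hidden: Vec, vec: Vec, strength: float = 1.0) -> Vec:
    """Add the scaled style vector to one hidden-layer activation."""
    return [h + strength * v for h, v in zip(hidden, vec)]


# Toy recorded activations (2-D for readability; real layers are much wider).
style_acts = [[1.0, 2.0], [3.0, 4.0]]
neutral_acts = [[0.0, 0.0], [2.0, 2.0]]
vec = style_vector(style_acts, neutral_acts)
steered = steer([0.5, 0.5], vec, strength=2.0)
```

The `strength` scalar is what would make the steering parameterisable in this sketch: larger values push the activation further along the style direction.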
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Ireland (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)